Avoiding Overfitting, Pt. 2
Penalized Regression

Behavioral Data Science in R II
Unit 2
Module 6

Regularization

Techniques that reduce overfitting by accepting a small increase in model bias in exchange for a decrease in variance.

Penalized Regression

Mean Squared Error Loss Function: \[ \frac{1}{n}\sum_{i=1}^{n}(y_i - f(x_i))^2 \]

L1 Regularization (Lasso): \[ \frac{1}{n}\sum_{i=1}^{n}(y_i - f(x_i))^2 + \lambda \sum_{j=1}^{p} |\beta_j| \]

L2 Regularization (Ridge): \[ \frac{1}{n}\sum_{i=1}^{n}(y_i - f(x_i))^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \]

L1 (Lasso) vs L2 (Ridge)

Elastic Net
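The elastic net blends the two penalties. In tidymodels terms, `penalty` corresponds to \(\lambda\) and `mixture` to \(\alpha\) (\(\alpha = 1\) is the lasso, \(\alpha = 0\) is ridge). One common way to write the objective is the sketch below; note that glmnet's exact parameterization scales the ridge term by \((1-\alpha)/2\):

\[ \frac{1}{n}\sum_{i=1}^{n}(y_i - f(x_i))^2 + \lambda\left(\alpha \sum_{j=1}^{p} |\beta_j| + (1-\alpha) \sum_{j=1}^{p} \beta_j^2\right) \]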

Human Activity Recognition Data

dim(df)
[1] 6471  113

Human Activity Recognition Data

No penalty

mnr_spec <- multinom_reg(
  mode = "classification",
  engine = "glmnet",
  penalty = 0
)

mnr_wf <- workflow() %>% 
  add_recipe(rec) %>% 
  add_model(mnr_spec) 

mnr_fit <- mnr_wf %>% 
  fit(train) 
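A sketch of how the accuracy values below might be computed with yardstick. The outcome column name `activity` is a hypothetical placeholder; substitute the actual outcome in the recipe:

```r
library(yardstick)

# Predict on each split, attach the truth column, and compute accuracy.
# NOTE: `activity` is an assumed outcome column name, not from the slides.
mnr_fit %>%
  predict(train) %>%
  bind_cols(train) %>%
  accuracy(truth = activity, estimate = .pred_class)

mnr_fit %>%
  predict(test) %>%
  bind_cols(test) %>%
  accuracy(truth = activity, estimate = .pred_class)
```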

Train accuracy:

[1] 0.9559387

Test accuracy:

[1] 0.7126437

Tuning the L1 Penalty

mnr_spec_tune <- multinom_reg(
  mode = "classification",
  engine = "glmnet",
  penalty = tune(),
  mixture = 1
)

folds <- vfold_cv(train, v = 5)

param_grid <- grid_regular(penalty(), levels = 50)

tune_wf <- workflow() %>%
  add_recipe(rec) %>% 
  add_model(mnr_spec_tune)

tune_rs <- tune_grid(
  tune_wf,
  folds,
  grid = param_grid,
  metrics = metric_set(accuracy)
)
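Before computing the accuracies below, the tuned workflow has to be finalized and refit. A sketch of the standard tidymodels pattern (not the original slide code):

```r
# Pick the penalty with the best cross-validated accuracy,
# plug it into the workflow, and refit on the full training set.
best_penalty <- select_best(tune_rs, metric = "accuracy")

final_fit <- tune_wf %>%
  finalize_workflow(best_penalty) %>%
  fit(train)
```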

Train accuracy:

[1] 0.8678161

Test accuracy:

[1] 0.7643678

Shrinkage of estimates

[Figure: coefficient estimates under no penalty, L1 (lasso), and L2 (ridge); both penalties shrink estimates toward zero, and the lasso sets some coefficients exactly to zero]

When to use each:

  • Ridge:
    • Multicollinearity is a concern
    • Number of features is not a major concern
  • Lasso:
    • You want to keep a few key features and drive irrelevant coefficients exactly to zero
  • Elastic net:
    • Both overfitting and multicollinearity are concerns
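For the elastic net, both hyperparameters can be tuned at once. A sketch following the L1 example above (grid sizes are illustrative assumptions):

```r
# Elastic net: tune both penalty (lambda) and mixture (alpha;
# 0 = ridge, 1 = lasso) over a regular grid.
enet_spec <- multinom_reg(
  mode = "classification",
  engine = "glmnet",
  penalty = tune(),
  mixture = tune()
)

enet_grid <- grid_regular(penalty(), mixture(), levels = c(25, 5))

enet_rs <- tune_grid(
  workflow() %>% add_recipe(rec) %>% add_model(enet_spec),
  folds,
  grid = enet_grid,
  metrics = metric_set(accuracy)
)
```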